aa wrapper and KLU migration to PNM for PFs by jd-lara · Pull Request #302 · Sienna-Platform/PowerNetworkMatrices.jl

jd-lara · 2026-05-14T00:51:23Z

This PR extends use of the AA (AppleAccelerate) framework to the rest of the matrices and changes the KLU wrapper so that PowerFlows can use it later, drop its KLU.jl dependency completely, and inherit the protection against the pointer problem. It will close Sienna-Platform/PowerFlows.jl#107.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Replaces the optional AppleAccelerate.jl weak dependency / package extension with an in-tree AccelerateWrapper submodule that binds directly to libSparse.dylib on macOS. The wrapper exposes a cached symbolic+numeric LDLT factorization, in-place dense/sparse-RHS solves, and SpMM/SpMV; PNM's PTDF/LODF/VirtualPTDF code paths are migrated from AAFactorization to the new AAFactorCache. Non-Apple builds get a stub module that errors on use, removing the runtime extension gating.

Changes:

- New src/AccelerateWrapper/ submodule with libSparse ccalls, a symbolic/numeric factor cache, and dense/sparse solve + SpMM bindings (macOS-gated; stubs elsewhere).
- Inlined _calculate_PTDF_matrix_AppleAccelerate / _calculate_LODF_matrix_AppleAccelerate into src/, swapped _solve_factorization/_create_factorization/with_solver to dispatch on AAFactorCache, and replaced _has_apple_accelerate_ext() with _has_apple_accelerate_backend() = Sys.isapple().
- Removed ext/AppleAccelerateExt.jl, the AppleAccelerate weakdep/compat entries, the runtests.jl install step, and the corresponding Aqua stale-dep ignore; added test/test_accelerate_wrapper.jl.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `src/AccelerateWrapper/AccelerateWrapper.jl` | Submodule entry; macOS-gated includes plus non-Apple stubs. |
| `src/AccelerateWrapper/libsparse_bindings.jl` | C struct layouts, enums, and mangled `@ccall` bindings into libSparse. |
| `src/AccelerateWrapper/aa_cache.jl` | `AAFactorCache` lifecycle: lower-triangle pattern, symbolic/numeric (re)factor, finalizer. |
| `src/AccelerateWrapper/solve_dense.jl` | In-place dense vector/matrix solves and `\` overload. |
| `src/AccelerateWrapper/solve_sparse_rhs.jl` | Block-packed sparse-RHS solver with reusable per-cache scratch. |
| `src/AccelerateWrapper/spmm.jl` | `aa_spmm!` / `aa_spmv!` bindings with per-call CSC→Apple index translation. |
| `src/PowerNetworkMatrices.jl` | Includes the wrapper, imports its symbols, drops AppleAccelerate forward decls. |
| `src/linalg_settings.jl` | Drops the extension probe; adds `_has_apple_accelerate_backend` and updates `check_linalg_backend`. |
| `src/solver_dispatch.jl` | Adds `with_solver` overload specialized on `AAFactorCache`; updates docstring. |
| `src/ptdf_calculations.jl` | Inlines PTDF AppleAccelerate path using `AccelerateWrapper`. |
| `src/lodf_calculations.jl` | Inlines LODF AppleAccelerate path using `AccelerateWrapper`. |
| `src/virtual_ptdf_calculations.jl` | Switches `_solve_factorization` to typed `AAFactorCache` overload; updates docs. |
| `ext/AppleAccelerateExt.jl` | Removed (logic moved into `src/`). |
| `Project.toml` | Drops `AppleAccelerate` weakdep, extension entry, and compat bound. |
| `test/PowerNetworkMatricesTests.jl` | Removes `:AppleAccelerate` from Aqua stale-dep ignore list. |
| `test/runtests.jl` | Drops the `Pkg.add("AppleAccelerate")` branch and updates the comment. |
| `test/test_accelerate_wrapper.jl` | New unit tests for `AAFactorCache`, dispatch, `with_solver`, and KLU parity. |
| `test/test_ptdf.jl`, `test/test_lodf.jl`, `test/test_virtual_ptdf.jl`, `test/test_powerflow_matrix_types.jl`, `test/test_network_modification.jl` | Switch guards to `_has_apple_accelerate_backend`, update messages, and update expected factor type to `AAFactorCache`. |

+"""
+    solve!(cache, B) -> B
+
+Solve `A · X = B` in place. `B::StridedVecOrMat{Float64}` must have
+first-dimension size equal to `cache.n` and unit stride in the first
+dimension. Multiple columns of `B` are handled in a single libSparse call.
+"""
+function solve!(cache::AAFactorCache, B::StridedMatrix{Cdouble})


-    This solver is only available on macOS.
-    Install AppleAccelerate:
-    julia> using Pkg; Pkg.add(\"AppleAccelerate\")"""
+_has_apple_accelerate_backend() = Sys.isapple()


luke-kiernan · 2026-05-14T17:28:01Z

+# Called by libSparse on failure with a null-terminated C string; we surface
+# the message via `@error`. Must be a top-level function so `@cfunction` can
+# resolve it.
+function _libsparse_report_error(msg::Cstring)
+    s = unsafe_string(msg)
+    @error "libSparse reported an error" message = s
+    return nothing
+end
+
+# `reportError` is fired by libSparse before it returns a failure status —
+# we log the message so it ends up in user output rather than libSparse's
+# own stderr. Passing libc malloc/free explicitly (the C_NULL "use Apple


Seconded. Looking at AppleAccelerate.jl, I see instead: `@cfunction(text->error(unsafe_string(text)), Cvoid, (Cstring, ))`. Probably don't need to go that low-level, but `@error` seems risky.

Copilot

Pull request overview

Copilot reviewed 38 out of 39 changed files in this pull request and generated 2 comments.

@@ -383,7 +410,7 @@ efficient when the prerequisite matrices with factorization are already availabl
 - `BA::BA_Matrix`: The branch susceptance weighted incidence matrix (B * A)

 # Keyword Arguments
- `linear_solver::String = "KLU"`:
+- `linear_solver::String = _default_linear_solver()`:
        Linear solver algorithm for matrix computations. Currently only "KLU" is supported


+        col_start = pos
+        for p in nzrange(A, j)
+            if rowval[p] >= j
+                pos += 1
+                pos > cache.nnz_tri && return _pattern_mismatch(op)
+                cache.rowIndices[pos] == Cint(rowval[p] - 1) ||
+                    return _pattern_mismatch(op)
+            end
+        end
+        cache.columnStarts[j + 1] == Clong(pos) ||
+            return _pattern_mismatch(op)
+        # silence unused-warning on col_start
+        col_start === col_start


github-actions · 2026-05-14T14:34:45Z

Performance Results

Precompile Time

| Main | This Branch | Delta |
| --- | --- | --- |
| 2.19 s | 2.258 s | +3.1% |

Execution Time

| Test | Main | This Branch | Delta |
| --- | --- | --- | --- |
| matpower_ACTIVSg2000_sys-Build PTDF First | 1.726 s | 1.882 s | +9.0% |
| matpower_ACTIVSg2000_sys-Build PTDF Second | 152.8 ms | 99.4 ms | -34.9% |
| matpower_ACTIVSg2000_sys-Build Ybus First | 397.0 ms | 14.7 ms | -96.3% |
| matpower_ACTIVSg2000_sys-Build Ybus Second | 13.3 ms | 13.1 ms | -1.7% |
| matpower_ACTIVSg2000_sys-Build LODF First | 153.6 ms | 587.4 ms | +282.4% |
| matpower_ACTIVSg2000_sys-Build LODF Second | 170.3 ms | 151.9 ms | -10.8% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF First | 3.496 s | 4.442 s | +27.1% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF Second | 66.1 ms | 543.3 ms | +721.8% |
| matpower_ACTIVSg2000_sys-VirtualMODF Query 10 rows | 531.3 ms | 502.9 ms | -5.3% |
| matpower_ACTIVSg2000_sys-Radial network reduction First | 437.8 ms | 449.4 ms | +2.6% |
| matpower_ACTIVSg2000_sys-Radial network reduction Second | 0.7 ms | 0.7 ms | -1.1% |
| matpower_ACTIVSg2000_sys-Degree two network reduction First | 1.64 s | 1.691 s | +3.1% |
| matpower_ACTIVSg2000_sys-Degree two network reduction Second | 1.1 ms | 1.1 ms | -6.3% |
| Base_Eastern_Interconnect_515GW-Build Ybus First | 4.629 s | 3.591 s | -22.4% |
| Base_Eastern_Interconnect_515GW-Build Ybus Second | 3.206 s | 3.093 s | -3.5% |
| Base_Eastern_Interconnect_515GW-Radial network reduction First | 39.7 ms | 42.0 ms | +6.0% |
| Base_Eastern_Interconnect_515GW-Radial network reduction Second | 40.1 ms | 158.3 ms | +295.0% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction First | 354.4 ms | 370.0 ms | +4.4% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction Second | 44.3 ms | 41.6 ms | -6.1% |

luke-kiernan

Big picture question: what's the motivation or goal here? Yeah, using the public Julia package has its issues and limitations, but it's also modular and maintained in tandem with the _jll binary. So I'd like to hear your reasoning here.

If there's specific shortcomings of these libraries, I'm willing to go open a PR. I expect it wouldn't get merged for a while (~months), but at least then we wouldn't need to maintain our own low-level bindings indefinitely.

My other question: why is AA no longer an extension?

luke-kiernan · 2026-05-14T17:11:18Z

+    factorization_type::SparseFactorization_t = SparseFactorizationLDLT,
+)
+    n = size(A, 1)
+    n == size(A, 2) || throw(DimensionMismatch("matrix must be square; got $(size(A))"))


If the user tries to apply a symmetric factorization to a non-symmetric matrix, this will silently take the lower triangle and factorize a different matrix, which is potentially hazardous. I'd at least check that the pattern is symmetric, with a flag to bypass.
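
For illustration, a minimal sketch of the kind of pattern check suggested here. The names `has_symmetric_pattern`, `check_symmetric_pattern`, and the `skip_check` flag are hypothetical and not part of the PR; the check compares only the stored-entry pattern, not the values.

```julia
using SparseArrays

# Hypothetical helper (not in the PR): true when the stored-entry pattern
# of A equals the pattern of its transpose. Values are ignored.
function has_symmetric_pattern(A::SparseMatrixCSC)
    size(A, 1) == size(A, 2) || return false
    rows, cols, _ = findnz(A)
    # Boolean matrix holding only the sparsity pattern of A.
    P = sparse(rows, cols, fill(true, length(rows)), size(A, 1), size(A, 2))
    return P == copy(transpose(P))
end

# Hypothetical guard with an opt-out flag, as proposed in the comment above.
function check_symmetric_pattern(A::SparseMatrixCSC; skip_check::Bool = false)
    skip_check && return nothing
    has_symmetric_pattern(A) ||
        throw(ArgumentError("sparsity pattern is not symmetric"))
    return nothing
end
```

The check costs one pattern copy and one transpose, so it would probably belong in the symbolic-factorization path rather than in every numeric refactor.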

luke-kiernan · 2026-05-14T17:13:43Z

+Release the libSparse numeric and symbolic handles held by `cache`, leaving
+Julia-side state intact. Idempotent.
+"""
+function _free_handles!(cache::AAFactorCache)


And if it isn't SparseStatusOk? Leaving it around seems iffy. Unlikely to cause issues in practice, but could clog up the heap with un-GCed objects in theory.

luke-kiernan · 2026-05-14T19:05:07Z

+# Status codes and shared error helper
+# ---------------------------------------------------------------------------
+
+const KLU_OK = 0


nitpick: an Int-backed enum would be more idiomatic.
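
A sketch of what that could look like, assuming the status values documented in SuiteSparse's `klu.h`. The enum name `KLUStatus` and the helper `check_klu_status` are hypothetical, not names from the PR.

```julia
# Status codes as defined in SuiteSparse's klu.h; the enum name is an assumption.
@enum KLUStatus::Cint begin
    KLU_OK = 0
    KLU_SINGULAR = 1
    KLU_OUT_OF_MEMORY = -2
    KLU_INVALID = -3
    KLU_TOO_LARGE = -4
end

# Hypothetical helper: convert a raw status from the C side and throw on
# hard failures. KLU_SINGULAR is returned so the caller can decide.
function check_klu_status(raw::Integer)
    status = KLUStatus(raw)  # throws ArgumentError on an unknown code
    (status == KLU_OK || status == KLU_SINGULAR) && return status
    error("KLU failed with status $(status)")
end
```

Compared with bare `const` integers, the enum prints its name in error messages and rejects unknown codes at conversion time.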

luke-kiernan · 2026-05-14T19:25:15Z

 # Keyword Arguments
 - `linear_solver::String = "KLU"`:
-        Linear solver algorithm for matrix computations. Currently only "KLU" is supported
+        This constructor is intentionally KLU-only because `ABA.K` is always a


Reasoning behind this decision? Which matrices are AA-compatible and why?

jd-lara · 2026-05-14T19:35:58Z

> Big picture question: what's the motivation or goal here? Yeah, using the public Julia package has its issues and limitations, but it's also modular and maintained in tandem with the _jll binary. So I'd like to hear your reasoning here.
>
> If there's specific shortcomings of these libraries, I'm willing to go open a PR. I expect it wouldn't get merged for a while (~months), but at least then we wouldn't need to maintain our own low-level bindings indefinitely.
>
> My other question: why is AA no longer an extension?

In short, I am trying to reduce the number of external dependencies over which we have no control, especially around the linear solvers, and to expose only the code that we need. I have been frustrated with how slowly the other libraries respond, and with our exposure to mistakes there, as happened with the solve! call.

Another reason, especially for the linear solvers, is to have a cohesive infrastructure for PNM and PFs.

josephmckinsey

I'm mainly confused as to why we are making so many changes to KLU immediately again. I haven't dug into the Apple Accelerate part yet.

josephmckinsey · 2026-05-14T20:54:12Z

What feature is this used for? I don't actually see any specific code for KLU, so it seems like it could potentially be used for any LinearAlgebra.Factorization.

josephmckinsey · 2026-05-14T20:56:29Z

I'm a bit confused why we are wrapping this. KLU's internals weren't very thread-safe, which made it a clearer case until that can be fixed upstream, but I thought AppleAccelerate.jl was a bit better now?

jd-lara · 2026-05-14T21:13:20Z

> I'm mainly confused as to why we are making so many changes to KLU immediately again. I haven't dug into the Apple Accelerate part yet.

Maybe not in the cleanest way, I layered the changes needed in KLU here so we can remove the use of KLU.jl in PowerFlows and use our own wrapper. In PFs we use Int32.

luke-kiernan · 2026-05-22T20:18:06Z

+            arc_keys = spread_keys(PNM.get_arc_axis(vptdf_ref), N_ROWS)
+            sample = function ()
+                v = PNM.VirtualPTDF(sys; linear_solver = solver)
+                GC.gc()


Why the explicit call to garbage collection here? Elsewhere in this file too.

luke-kiernan · 2026-05-22T20:25:04Z

    end
-    _populate_lower_triangle_pattern!(cache, A)
+    _populate_pattern!(cache, A)
    sym = _sparse_symbolic_factor(


nitpick: maybe rename this variable to something other than sym now that it's not a symmetric matrix.

luke-kiernan · 2026-05-22T20:34:54Z

+    # Workspace-aware overload — caller-supplied scratch avoids a per-call
+    # malloc/free inside libSparse.
+    ws = _ensure_solve_workspace!(cache, size(B, 2))
+    GC.@preserve cache _sparse_solve_matrix_ws!(cache.numeric, _dense_matrix(B), ws)


Is the GC.@preserve here precautionary or necessary?

luke-kiernan · 2026-05-22T20:37:38Z

+    Tv = eltype(X)
    fill!(X, zero(Tv))


Nitpick: pretty sure just `fill!(X, 0.0)` will work. And if not, then introduce `Tv` as a type parameter.
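
To make the two options concrete, a small sketch with a hypothetical function name (`zero_solution!` is not from the PR). `fill!` converts its value argument to the array's element type, so `0.0` works for any float element type; the type-parameter form just makes `Tv` explicit in the signature.

```julia
# Option 1: rely on fill! converting 0.0 to eltype(X).
function zero_solution_simple!(X::AbstractVecOrMat{<:AbstractFloat})
    fill!(X, 0.0)
    return X
end

# Option 2: name the element type as a type parameter instead of
# recovering it with eltype(X) inside the body.
function zero_solution!(X::AbstractVecOrMat{Tv}) where {Tv <: AbstractFloat}
    fill!(X, zero(Tv))
    return X
end
```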

luke-kiernan · 2026-05-22T20:42:49Z

+# libSparse rejects factorization type 80.
+const _AA_MIN_MACOS = v"15.5"
+
+# Query the running macOS product version via the `kern.osproductversion`


AppleAccelerate.jl has a very similar function. Could just copy-paste their implementation. [Or do you have reasons for not using theirs?]

luke-kiernan · 2026-05-22T20:44:15Z

+VirtualPTDF / VirtualLODF / VirtualMODF constructors.
 """
-_default_linear_solver() = Sys.isapple() ? "AppleAccelerate" : "KLU"
+function _default_linear_solver()


Nitpick: might belong better in __init__ or in a similar compile-time "run just once" spot.
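
One way to read this suggestion, as a sketch only: compute the default once in the module's `__init__` and cache it in a `Ref`. The module name, the `Ref`, and the accessor are hypothetical, and the selection logic here is the old one-liner from the diff above, not the new function body (which is not shown in this hunk).

```julia
module SolverDefaultsSketch

# Cached default; "KLU" is only a placeholder until __init__ runs.
const DEFAULT_LINEAR_SOLVER = Ref{String}("KLU")

# Runs once when the module is loaded, so the platform probe is not
# repeated on every constructor call.
function __init__()
    DEFAULT_LINEAR_SOLVER[] = Sys.isapple() ? "AppleAccelerate" : "KLU"
    return nothing
end

default_linear_solver() = DEFAULT_LINEAR_SOLVER[]

end # module
```

Doing the probe in `__init__` rather than in a top-level `const` matters if the check depends on the running machine, since top-level constants are fixed at precompile time.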

luke-kiernan · 2026-05-22T22:11:33Z

+    _get_PTDF_A_diag(K, BA, A, ref_bus_positions) -> Vector{Float64}
+
+Compute `diag(PTDF · A)`. Each row of `A` has exactly two nonzeros (+1 at the
+from-bus, -1 at the to-bus), so the per-arc dot product reduces to two indexed
+reads into the solved PTDF row after a one-time transpose of `A`.


Pre-existing issue, but: is there really no better way to do this? I have a sense of deja vu, as if I've gone down this path before and it went nowhere.

edit: okay this has lower memory usage than pre-computing the entirety of K^{-1} (which could be dense). I'm guessing that's why we pick this method.
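
For reference, the identity the docstring relies on, as a small sketch. It assumes `A` is arcs × buses, so the product is taken against the transpose of `A` for the dimensions to conform, and it uses a dense PTDF purely for illustration. The function name and the `from_bus` / `to_bus` vectors are hypothetical.

```julia
using LinearAlgebra

# Row k of A has +1 at from_bus[k] and -1 at to_bus[k], so entry k of
# diag(PTDF * A') is just the difference of two entries of PTDF row k.
function ptdf_a_diag(ptdf::AbstractMatrix, from_bus::Vector{Int}, to_bus::Vector{Int})
    return [ptdf[k, from_bus[k]] - ptdf[k, to_bus[k]] for k in eachindex(from_bus)]
end

# Tiny two-arc, three-bus example to compare against the full product.
ptdf = [0.1 0.2 0.3; 0.4 0.5 0.6]
A = [1 -1 0; 0 1 -1]
d = ptdf_a_diag(ptdf, [1, 2], [2, 3])
d_full = diag(ptdf * A')
```

In the sketch `d` and `d_full` agree, but the first form touches two entries per arc instead of forming the full arcs × arcs product.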

jd-lara added 4 commits May 13, 2026 13:25

WIP use AA natively

0b0f41d

WIP use AA natively

847c169

Merge branch 'jd/AA_wrapper' of https://github.com/NREL-Sienna/PowerN…

e318278

…etworkMatrices.jl into jd/AA_wrapper

assess copilot's comments and simplifications

09f9cd4

jd-lara requested a review from Copilot May 14, 2026 00:51

Copilot AI reviewed May 14, 2026

Copilot started reviewing on behalf of jd-lara May 14, 2026 00:59 View session

jd-lara added 5 commits May 13, 2026 21:11

use AA by default in apple

eb24e85

encapsulate better

b1651ac

enable AA on all matrices

c3d1f9f

improve indexing performance

3151464

add testing for other matrices

cd6dd10

jd-lara changed the title ~~aa wrapper~~ aa wrapper and KLU migration to PNM for PFs May 14, 2026

extend KLU wrapper for Int32

3fe6b15

jd-lara requested a review from Copilot May 14, 2026 04:17

jd-lara marked this pull request as ready for review May 14, 2026 04:18

jd-lara requested a review from luke-kiernan May 14, 2026 04:18

Copilot started reviewing on behalf of jd-lara May 14, 2026 04:18 View session

Copilot AI reviewed May 14, 2026

jd-lara added 2 commits May 14, 2026 00:24

resolve PR comments

b7a11d6

fix docstring creation

bba375e

jd-lara requested a review from josephmckinsey May 14, 2026 14:31

add a fix for documentation

164c28d

luke-kiernan reviewed May 14, 2026

Review comments fixes

1dc0889

josephmckinsey reviewed May 14, 2026

generalize refinement

f73b6fd

jd-lara requested review from josephmckinsey and luke-kiernan May 16, 2026 18:07

change to use AA_LU

fbabab4

jd-lara mentioned this pull request May 20, 2026

make ptdf diag lazy #304

Merged

jd-lara added 2 commits May 20, 2026 18:22

Merge remote-tracking branch 'origin/main' into jd/AA_wrapper

3b56cb6

add more testing for N-2 and N-3

1af918f

luke-kiernan approved these changes May 22, 2026
